**************************************************************************************
For:			"The White/Black Educational Gap, Stalled Progress, and the Long-term
				Consequences of the Emergence of Crack Cocaine Markets"
By: 			William N. Evans, Craig Garthwaite and Timothy J. Moore
**************************************************************************************

**************************************************************************************
*********************** CODE FOR CALCULATING MURDER RATES AND ************************
************* ESTIMATING WHEN CRACK COCAINE ARRIVED IN DIFFERENT MSA/STATES **********
**************************************************************************************

This code creates SAS files of the public use Multiple Cause of Death data created by 
the National Center for Health Statistics and held by the National Bureau of Economic
Research. The necessary information is available from 1973 to 2002.

There are two public-use files that are too large to include in these files, but that
can be downloaded easily:
1) To create the mortality files, download the SAS extract codes, the raw data files
and the codebooks from the NBER website:
	www.nber.org/data/vital-statistics-mortality-data-multiple-cause-of-death.html
The files created are named in the following way: mort.s73 is the full extract of 
the mortality data for 1973.

2) To create the single-year-of-age county-level population data, extract it using the zipped
version in the folder or download it from the Cancer SEER website:
	http://seer.cancer.gov/popdata/download.html
Note: You need to manually change the file path in the code below (search for "<insert
file path>\us.1969_2014.singleages.adjusted.txt"). The name of the file may change, but
the variable names, column locations and data for the sample period will remain the
same.

*** Notes about the mortality data:

There are many issues related to changes in the way NCHS has coded variables across
the years, especially in how geography is defined and whether county identifiers
are available. County of occurrence is identified throughout 1973-1988. In 1989 and
beyond, only large counties are identified: 465 in 1989-1993 and 507 in 1994-1998, of
which 454 are consistent across both periods. MSA coding changes in 1990 and 1994.
Our strategy was to get consistent MSA codes from 1990 to 1998, which means dealing
with one version change in MSAs, then use the complete county codes from 1973 to 1988
to get a data series that is available for all years except 1989, then limit the sample
to MSAs that overlap with the MSA coding that is specific to 1989.

There are also several inconsistencies across the years when they need to be used as 
one combined dataset:

1. Year variables change, so a new year variable was created before a merge occurred

2. The 3-category race recode changes in 1979:
	1973-78: 1-White, 2-Black, 3-Other race
	1979-88: 1-White, 2-Other race, 3-Black

These are full extracts: some of the variables are not needed in the analysis and are 
dropped in subsequent analyses.
;

libname mort "<insert library name filepath here>";

*** Inputting the murder data (note: first create these files using NBER extracts - details above);
data a73; set mort.s73; yr=1973; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a74; set mort.s74; yr=1974; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a75; set mort.s75; yr=1975; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a76; set mort.s76; yr=1976; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a77; set mort.s77; yr=1977; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a78; set mort.s78; yr=1978; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a79; set mort.s79; yr=1979; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a80; set mort.s80; yr=1980; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a81; set mort.s81; yr=1981; if ucr34='360'; keep staters countyrs popsize yr sex racer3 age;
data a82; set mort.s82; yr=1982; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age; 
data a83; set mort.s83; yr=1983; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a84; set mort.s84; yr=1984; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a85; set mort.s85; yr=1985; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a86; set mort.s86; yr=1986; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a87; set mort.s87; yr=1987; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a88; set mort.s88; yr=1988; if ucr34='360'; cntyfips=fipsctyr/1; keep staters cntyfips popsize yr sex racer3 age;
data a89; set mort.s89; yr=1989; if ucr34='360'; msafips=fipssmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a90; set mort.s90; yr=1990; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a91; set mort.s91; yr=1991; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a92; set mort.s92; yr=1992; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a93; set mort.s93; yr=1993; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a94; set mort.s94; yr=1994; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a95; set mort.s95; yr=1995; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsctyr/1; keep staters cntyfips msafips popsize yr sex racer3 age;
data a96; set mort.s96; yr=1996; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters cntyfips msafips popsize yr sex racer3 age;
data a97; set mort.s97; yr=1997; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters cntyfips msafips popsize yr sex racer3 age;
data a98; set mort.s98; yr=1998; if ucr34='360'; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters cntyfips msafips popsize yr sex racer3 age;
data a99; set mort.s99; yr=1999; if ucr39=41; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters fipsctyr msafips popsize yr sex racer3 age;
data a00; set mort.s00; yr=2000; if ucr39=41; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters fipsctyr msafips popsize yr sex racer3 age;
data a01; set mort.s01; yr=2001; if ucr39=41; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters fipsctyr msafips popsize yr sex racer3 age;
data a02; set mort.s02; yr=2002; if ucr39=41; msafips=fipspmsa/1; cntyfips=fipsstr*1000+fipsctyr; keep staters fipsctyr msafips popsize yr sex racer3 age;
run;
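/* A minimal sketch, not part of the original program: the yearly extract steps
   above repeat one pattern, so the 1973-1981 block could be generated by a macro
   loop. The macro name murder_extract_sketch and its parameters are hypothetical.
   The macro is only defined here and is never called, so it does not create or
   alter any dataset. It applies only to years that share the ucr34 and countyrs
   layout used in the 1973-1981 steps. */
%macro murder_extract_sketch(first=1973, last=1981);
  %local yr yy;
  %do yr=&first %to &last;
    %let yy=%substr(&yr,3,2); /* two-digit suffix used in the mort.sYY names */
    data a&yy;
      set mort.s&yy;
      yr=&yr;
      if ucr34='360'; /* same murder filter as the steps above */
      keep staters countyrs popsize yr sex racer3 age;
    run;
  %end;
%mend murder_extract_sketch;
/* Possible call, left commented out: %murder_extract_sketch(first=1973, last=1981) */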
*** Cleaning up the mortality data;
data a7302;
set a73-a99 a00-a02;
if staters>51 then delete;  *removing territories;
*race recoding;
if racer3=1 then race=1; *white;
if 1973<=yr<=1978 & racer3=2 then race=2; *black;
if 1979<=yr<=2002 & racer3=3 then race=2; *black;
if 1973<=yr<=1978 & racer3=3 then race=3; *other race;
if 1979<=yr<=2002 & racer3=2 then race=3; *other race;
*age recoding to create age groups;
agegrp=14;
if age=999 then agegrp=.; *missing values;
if (15<=age<=19) then agegrp=1519;
if (20<=age<=24) then agegrp=2024;
if (25<=age<=39) then agegrp=2539;
if (40<=age<=150) then agegrp=4099;
keep yr staters sex race agegrp popsize;

*** Inputting the population data;
data pop;
infile '<insert file path>\us.1969_2014.singleages.adjusted.txt' LRECL=30;
input
yr	 	 1-4
stname	$5-6
stfips	 7-8
cntyfips 7-11
race	 14
sex		$16
age		 17-18
pop		 19-26;

*** Cleaning up the population data;
data pop2;
set pop;
if 1973<=yr<=2002;
*age recoding to create age groups;
agegrp=14;
if age=999 then agegrp=.; *missing values;
if (15<=age<=19) then agegrp=1519;
if (20<=age<=24) then agegrp=2024;
if (25<=age<=39) then agegrp=2539;
if (40<=age<=150) then agegrp=4099;
run;
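
/* A minimal sketch, not part of the original program: a cross-tabulation that
   could confirm the harmonized race variable in a7302 is populated in every year
   on both sides of the 1979 change in the racer3 recode. The macro name
   check_race_recode and its parameter are hypothetical. The macro is defined but
   never called, so it produces no output unless invoked. */
%macro check_race_recode(data=a7302);
  proc freq data=&data;
    tables yr*race / missing norow nocol nopercent; /* counts by year and race */
  run;
%mend check_race_recode;
/* Possible call, left commented out: %check_race_recode(data=a7302) */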

*******************************************************
**** Figure 2 Expectations of Death by Age 30 for *****
*** Black Males at Age 15, Based on Contemporaneous ***
************* Age-Specific Mortality Rates ************
*******************************************************;

*** Mortality data;
data x80; set mort.s80; yr=1980; if countyrs in ('09001' '19036' '26096' '44057'); 
data x81; set mort.s81; yr=1981; if countyrs in ('09001' '19036' '26096' '44057'); 
data x82; set mort.s82; yr=1982; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113); 
data x83; set mort.s83; yr=1983; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113); 
data x84; set mort.s84; yr=1984; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x85; set mort.s85; yr=1985; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x86; set mort.s86; yr=1986; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x87; set mort.s87; yr=1987; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x88; set mort.s88; yr=1988; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x89; set mort.s89; yr=1989; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x90; set mort.s90; yr=1990; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x91; set mort.s91; yr=1991; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x92; set mort.s92; yr=1992; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x93; set mort.s93; yr=1993; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x94; set mort.s94; yr=1994; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x95; set mort.s95; yr=1995; cntyfips=fipsctyr/1; if cntyfips in (11001 22071 29510 48113);
data x96; set mort.s96; yr=1996; cntyfips=fipsstr*1000+fipsctyr; if cntyfips in (11001 22071 29510 48113); 
data x97; set mort.s97; yr=1997; cntyfips=fipsstr*1000+fipsctyr; if cntyfips in (11001 22071 29510 48113); 
data x98; set mort.s98; yr=1998; cntyfips=fipsstr*1000+fipsctyr; if cntyfips in (11001 22071 29510 48113); 
data x99; set mort.s99; yr=1999; cntyfips=fipsstr*1000+fipsctyr; if cntyfips in (11001 22071 29510 48113); 
data x00; set mort.s00; yr=2000; cntyfips=fipsstr*1000+fipsctyr; if cntyfips in (11001 22071 29510 48113); 

data x8081; 
set x80 x81;
if countyrs='09001' then cntyfips=11001; *District of Columbia;
if countyrs='19036' then cntyfips=22071; *New Orleans;
if countyrs='26096' then cntyfips=29510; *St. Louis City;
if countyrs='44057' then cntyfips=48113; *Dallas TX;

data x8000; 
set x8081 x82-x99 x00;
if sex=1; *males;
if racer3=3; *black;
if 15<=age<=29; age2=age/1;
if 1980<=yr<=1998 & ucr34='360' then murder=1; 
if 1999<=yr<=2000 & ucr39=41 then murder=1; 
keep yr cntyfips age2 murder;
rename age2=age;
proc sort; by yr cntyfips age;
proc means noprint nway; 
class yr cntyfips age;
output out=x8000b N(yr)=deaths sum(murder)=murders;

*** Population data;
data xpop;
set pop2;
if 1980<=yr<=2000;
if 15<=age<=29;
if sex=1; *males;
if race=2; *black;
if cntyfips in (11001 22071 29510 48113);
keep yr cntyfips age pop;
proc sort; by yr cntyfips age;

proc means noprint nway; 
class yr cntyfips age;
output out=xpopb sum(pop)=pop;

*** Calculating rates;
data a;
merge x8000b xpopb;
by yr cntyfips age;
dthrate=deaths/pop;
if dthrate=. then dthrate=0;
murrate=murders/pop;
if murrate=. then murrate=0;
proc sort; by cntyfips yr;

*** Cumulative probabilities for all deaths;
proc transpose data=a out=b prefix=age;
by cntyfips yr;
id age;
var dthrate;
run;

data c;
set b;
deathprob=
+age15
+age16*(1-age15)
+age17*(1-age15)*(1-age16)
+age18*(1-age15)*(1-age16)*(1-age17)
+age19*(1-age15)*(1-age16)*(1-age17)*(1-age18)
+age20*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)
+age21*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)
+age22*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)
+age23*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)
+age24*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)
+age25*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)
+age26*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)
+age27*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)
+age28*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)*(1-age27)
+age29*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)*(1-age27)*(1-age28);
keep cntyfips yr deathprob;
proc export data=c outfile='C:\Users\tmoore\Desktop\crack\sas\cumulative_rates_death.csv' replace;
run;
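
/* A minimal sketch, not part of the original program: the sum above equals one
   minus the probability of surviving every age from 15 to 29, so it could be
   cross-checked with an array loop. The macro name cumprob_sketch, its
   parameters, and the variables survive and a are hypothetical. The macro is
   defined but never called. It assumes the input has columns age15 to age29,
   as produced by the proc transpose step above. */
%macro cumprob_sketch(in=b, out=c_check, prob=deathprob_chk);
  data &out;
    set &in;
    array q{15:29} age15-age29; /* age-specific rates, indexed by age */
    survive=1;
    do a=15 to 29;
      survive=survive*(1-q{a}); /* survive age a given survival so far */
    end;
    &prob=1-survive; /* probability of dying at some age from 15 to 29 */
    keep cntyfips yr &prob;
  run;
%mend cumprob_sketch;
/* Possible call, left commented out: %cumprob_sketch(in=b, out=c_check) */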

*** Cumulative probabilities for murders;
proc transpose data=a out=b prefix=age;
by cntyfips yr;
id age;
var murrate;
run;

data c;
set b;
murdprob=
+age15
+age16*(1-age15)
+age17*(1-age15)*(1-age16)
+age18*(1-age15)*(1-age16)*(1-age17)
+age19*(1-age15)*(1-age16)*(1-age17)*(1-age18)
+age20*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)
+age21*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)
+age22*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)
+age23*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)
+age24*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)
+age25*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)
+age26*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)
+age27*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)
+age28*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)*(1-age27)
+age29*(1-age15)*(1-age16)*(1-age17)*(1-age18)*(1-age19)*(1-age20)*(1-age21)*(1-age22)*(1-age23)*(1-age24)*(1-age25)*(1-age26)*(1-age27)*(1-age28);
keep cntyfips yr murdprob;
proc export data=c outfile='C:\Users\tmoore\Desktop\crack\sas\cumulative_rates_murders.csv' replace;
run;


********************************************************
**** Figure 4 Murder Rates and Prison Intake Rates, ****
*** Multiple Cause of Death and National Corrections ***
********* Reporting Program Data, 1980-2000 ************
********************************************************;

******** Panel A: Murder Rates for Various Age Groups *********;

*** Mortality counts;
data natmurders;
set a7302;
* restricting to year range in the figure;
if 1980<=yr<=2000;
*combining the 15-19 and 20-24 groups to 15-24;
if agegrp=1519 then agegrp=1524;
if agegrp=2024 then agegrp=1524;
proc means data=natmurders noprint nway; 
class yr agegrp;
output out=a1 N(yr)=murders;

*** Population counts;
data natpop;
set pop2;
* restricting to year range in the figure;
if 1980<=yr<=2000;
*combining the 15-19 and 20-24 groups to 15-24;
if agegrp=1519 then agegrp=1524;
if agegrp=2024 then agegrp=1524;
proc means data=natpop noprint nway; 
class yr agegrp;
output out=a2 sum(pop)=population;

data b;
merge a1 a2;
by yr agegrp;
murdersper100k=murders/population*100000;

proc transpose data=b out=c prefix=agegrp;
by yr;
id agegrp;
var murdersper100k;
proc export data=c outfile='C:\Users\tmoore\Desktop\crack\sas\fig4a.csv' replace;
run;


******** Panel B: Murder Rates for Those Aged 15-24, By Race and Sex **********;

*** Mortality counts, age 15-24;
data natmurders2;
set natmurders;
if agegrp=1524;
if sex in (1 2);
if race in (1 2);
if sex=1 then male=1; else male=0;
if race=2 then black=1; else black=0;

proc means data=natmurders2 noprint nway; 
class yr male black;
output out=a1 N(yr)=murders;

*** Population counts;
data natpop2;
set natpop;
if agegrp=1524;
if sex=1 then male=1; else male=0;
if race=2 then black=1; else black=0;
proc means data=natpop2 noprint nway; 
class yr male black;
output out=a2 sum(pop)=population;

data b;
merge a1 a2;
by yr male black;
murdersper100k=murders/population*100000;

proc transpose data=b out=c prefix=maleblack;
by yr;
id male black;
var murdersper100k;
proc export data=c outfile='C:\Users\tmoore\Desktop\crack\sas\fig4b.csv' replace;
run;


********** Panel C: Change in Murder Counts of Black Males since 1980,***********
******************** Aged 15-24, By Area Population Size ************************;

*** Mortality counts, black males age 15-24;
data natmurders3;
set natmurders2;
if agegrp=1524;
if male=1;
if black=1; 
*getting consistent population size categories;
if popsize='Z' then delete;
citysize=0;
if 0<=popsize<=1 then citysize=500000;
if    popsize= 2 then citysize=250000;
if    popsize= 3 then citysize=100000;

proc means data=natmurders3 noprint nway; 
class yr citysize;
output out=a1 N(yr)=murders;
proc sort data=a1; by citysize;

data a2;
set a1;
if yr=1980;
rename murders=murders1980;
drop yr;
proc sort; by citysize;

data b;
merge a1 a2;
by citysize;
murdersto1980=murders/murders1980;
proc sort; by yr;

proc transpose data=b out=c prefix=citysize;
by yr;
id citysize;
var murdersto1980;
proc export data=c outfile='C:\Users\tmoore\Desktop\crack\sas\fig4c.csv' replace;
run;


****************************************************************
*** State-based Murder Rates for Ages 20-24, by Sex and Race ***
*****************************************************************;

* Create murder rates that are the primary independent variable of interest 
for Tables 5, 6 and A7;

*** Murder counts by state of residence, year, sex and race;
data stmurd1;
set a7302;
if agegrp=2024; *age restriction of 20-24;
if sex in (1 2);
if race in (1 2);

proc means noprint nway; 
class staters yr sex race;
output out=stmurd2 N(yr)=murders;
proc sort data=stmurd2; by staters yr sex race;
run;


*** Population counts;
data stpop1;
set pop2;
*combining the 15-19 and 20-24 groups to 15-24;
if agegrp=1519 then agegrp=1524;
if agegrp=2024 then agegrp=1524;
if agegrp=1524;
*sample restrictions;
if sex in (1 2);
if race in (1 2);
* matching the population data to NCHS codes;
if stname = 'AL' then staters = '01'; if stname = 'AK' then staters = '02';
if stname = 'AZ' then staters = '03'; if stname = 'AR' then staters = '04';
if stname = 'CA' then staters = '05'; if stname = 'CO' then staters = '06';
if stname = 'CT' then staters = '07'; if stname = 'DE' then staters = '08';
if stname = 'DC' then staters = '09'; if stname = 'FL' then staters = '10';
if stname = 'GA' then staters = '11'; if stname = 'HI' then staters = '12';
if stname = 'ID' then staters = '13'; if stname = 'IL' then staters = '14';
if stname = 'IN' then staters = '15'; if stname = 'IA' then staters = '16';
if stname = 'KS' then staters = '17'; if stname = 'KY' then staters = '18';
if stname = 'LA' then staters = '19'; if stname = 'ME' then staters = '20';
if stname = 'MD' then staters = '21'; if stname = 'MA' then staters = '22';
if stname = 'MI' then staters = '23'; if stname = 'MN' then staters = '24';
if stname = 'MS' then staters = '25'; if stname = 'MO' then staters = '26';
if stname = 'MT' then staters = '27'; if stname = 'NE' then staters = '28';
if stname = 'NV' then staters = '29'; if stname = 'NH' then staters = '30';
if stname = 'NJ' then staters = '31'; if stname = 'NM' then staters = '32';
if stname = 'NY' then staters = '33'; if stname = 'NC' then staters = '34';
if stname = 'ND' then staters = '35'; if stname = 'OH' then staters = '36';
if stname = 'OK' then staters = '37'; if stname = 'OR' then staters = '38';
if stname = 'PA' then staters = '39'; if stname = 'RI' then staters = '40';
if stname = 'SC' then staters = '41'; if stname = 'SD' then staters = '42';
if stname = 'TN' then staters = '43'; if stname = 'TX' then staters = '44';
if stname = 'UT' then staters = '45'; if stname = 'VT' then staters = '46';
if stname = 'VA' then staters = '47'; if stname = 'WA' then staters = '48';
if stname = 'WV' then staters = '49'; if stname = 'WI' then staters = '50';
if stname = 'WY' then staters = '51'; 

proc means noprint nway; 
class staters yr sex race;
output out=stpop2 sum(pop)=population;
proc sort data=stpop2; by staters yr sex race;

*** Rates - by merging murder counts and population;
data strates;
merge stmurd2 stpop2;
by staters yr sex race;
if murders=. then murders=0;
murdersper100k=murders/population*100000;
*sample restrictions;
if 1<=staters<=51;
if staters in ('02' '12' '13' '20' '27' '30' '35' '42' '46' '51') then delete; *removing smaller states;
if 1973<=yr<=2002;
proc freq data=strates; tables staters yr sex race;

*** Convert the NCHS codes to FIPS codes;
data stconvert;
infile cards expandtabs; *treats the tab characters in the lines below as blanks;
input staters $ stfips;
cards; 
01	 1
02	 2
03	 4
04	 5
05	 6
06	 8
07	 9
08	10
09	11
10	12
11	13
12	15
13	16
14	17
15	18
16	19
17	20
18	21
19	22
20	23
21	24
22	25
23	26
24	27
25	28
26	29
27	30
28	31
29	32
30	33
31	34
32	35
33	36
34	37
35	38
36	39
37	40
38	41
39	42
40	44
41	45
42	46
43	47
44	48
45	49
46	50
47	51
48	53
49	54
50	55
51	56
;
proc sort; by staters;

data strates2;
merge stconvert strates;
by staters;
if staters in ('02' '12' '13' '20' '27' '30' '35' '42' '46' '51') then delete; *removing smaller states;
keep yr stfips staters sex race murders murdersper100k population;
rename yr=year;
run;

proc export data=strates2 outfile='C:\Users\tmoore\Desktop\crack\sas\state_murderrate_2024.csv' replace;
run;


**************************************************************
*** MSA-based Murder Rates for Ages 20-24, by Sex and Race ***
**************************************************************;

*** Matching MSAs across the NCHS codes beginning 1989, 1990 and 1994;

******************* 1989 codes *******************; 
data b89;
set mort.nchs_geog89; *county-to-MSA codes;
if cntyfips= 6059 & msafips=360 then msafips=5945;
if cntyfips=18095 & msafips=400 then msafips=3480;
if cntyfips=45007 & msafips=405 then msafips=3160;
if cntyfips=17089 & msafips=620 then msafips=1600;
if cntyfips=17093 & msafips=620 then msafips=1600;
if cntyfips=26025 & msafips=780 then msafips=3720;
if cntyfips=42007 & msafips=845 then msafips=6280;
if cntyfips=12081 & msafips=1140 then msafips=7510;
if cntyfips= 9001 & msafips=1163 then msafips=5483;
if cntyfips=37001 & msafips=1300 then msafips=3120;
if cntyfips=17063 & msafips=3690 then msafips=1600;
if cntyfips=17197 & msafips=3690 then msafips=1600;
if cntyfips=17097 & msafips=3965 then msafips=1600;
if cntyfips=39093 & msafips=4440 then msafips=1680;
if cntyfips=33011 & msafips=4763 then msafips=1123;
if cntyfips=48329 & msafips=5040 then msafips=5800;
if cntyfips=26121 & msafips=5320 then msafips=3000;
if cntyfips=25005 & msafips=5403 then msafips=1123;
if cntyfips=36063 & msafips=5700 then msafips=1280;
if cntyfips=36071 & msafips=5950 then msafips=5660;
if cntyfips= 6111 & msafips=6000 then msafips=8735;
if cntyfips=28059 & msafips=6025 then msafips=0920;
if cntyfips=33015 & msafips=6453 then msafips=1123;
if cntyfips=33017 & msafips=6453 then msafips=1123;
if cntyfips=36027 & msafips=6460 then msafips=2281;
if cntyfips=53011 & msafips=8725 then msafips=6440;
if cntyfips=25027 & msafips=9243 then msafips=1123;

if cntyfips=26017 & msafips= 800 then msafips=6960;
if cntyfips=34025 & msafips=4410 then msafips=5190;
if cntyfips=34023 & msafips=5460 then msafips=5015;
if cntyfips=39089 & msafips=5645 then msafips=1840;
if cntyfips=51073 & msafips=5680 then msafips=5720;
if cntyfips=51095 & msafips=5680 then msafips=5720;
if cntyfips=51199 & msafips=5680 then msafips=5720;
if cntyfips=51650 & msafips=5680 then msafips=5720;
if cntyfips=51700 & msafips=5680 then msafips=5720;
if cntyfips=51735 & msafips=5680 then msafips=5720;
if cntyfips=51830 & msafips=5680 then msafips=5720;
if cntyfips=42069 & msafips=5745 then msafips=7560;
if cntyfips=42079 & msafips=5745 then msafips=7560;
if cntyfips=34031 & msafips=6040 then msafips= 875;
if cntyfips=51053 & msafips=6140 then msafips=6760;
if cntyfips=51149 & msafips=6140 then msafips=6760;
if cntyfips=51570 & msafips=6140 then msafips=6760;
if cntyfips=51670 & msafips=6140 then msafips=6760;
if cntyfips=51730 & msafips=6140 then msafips=6760;
if cntyfips=45091 & msafips=6885 then msafips=1520;
if cntyfips=37025 & msafips=7140 then msafips=1520;
if cntyfips=37159 & msafips=7140 then msafips=1520;
if cntyfips=39023 & msafips=7960 then msafips=2000;
if msafips=0 then delete;
proc sort; by msafips;

* Summing the county codes within an MSA to get an accurate measure of consistency;
proc means noprint nway; 
class msafips;
output out=c89 sum(cntyfips)=sumcnty89;

******************* 1990 codes *******************; 
data b90;
set mort.nchs_geog90; *county-to-MSA codes;
if cntyfips= 6059 & msafips=360 then msafips=5945;
if cntyfips=18095 & msafips=400 then msafips=3480;
if cntyfips=45007 & msafips=405 then msafips=3160;
if cntyfips=17089 & msafips=620 then msafips=1600;
if cntyfips=17093 & msafips=620 then msafips=1600;
if cntyfips=26025 & msafips=780 then msafips=3720;
if cntyfips=42007 & msafips=845 then msafips=6280;
if cntyfips=12081 & msafips=1140 then msafips=7510;
if cntyfips= 9001 & msafips=1163 then msafips=5483;
if cntyfips=37001 & msafips=1300 then msafips=3120;
if cntyfips=17063 & msafips=3690 then msafips=1600;
if cntyfips=17197 & msafips=3690 then msafips=1600;
if cntyfips=17097 & msafips=3965 then msafips=1600;
if cntyfips=39093 & msafips=4440 then msafips=1680;
if cntyfips=33011 & msafips=4763 then msafips=1123;
if cntyfips=48329 & msafips=5040 then msafips=5800;
if cntyfips=26121 & msafips=5320 then msafips=3000;
if cntyfips=25005 & msafips=5403 then msafips=1123;
if cntyfips=36063 & msafips=5700 then msafips=1280;
if cntyfips=36071 & msafips=5950 then msafips=5660;
if cntyfips= 6111 & msafips=6000 then msafips=8735;
if cntyfips=28059 & msafips=6025 then msafips=0920;
if cntyfips=33015 & msafips=6453 then msafips=1123;
if cntyfips=33017 & msafips=6453 then msafips=1123;
if cntyfips=36027 & msafips=6460 then msafips=2281;
if cntyfips=53011 & msafips=8725 then msafips=6440;
if cntyfips=25027 & msafips=9243 then msafips=1123;
if msafips=0 then delete;
proc sort; by msafips;

* Summing the county codes within an MSA to get an accurate measure of consistency;
proc means noprint nway; 
class msafips;
output out=c90 sum(cntyfips)=sumcnty90;

******************* 1994 codes *******************; 
data b94;
set mort.nchs_geog94;
if msafips=0 then delete;
proc sort; by msafips;

* Summing the county codes within an MSA to get an accurate measure of consistency;
proc means noprint nway; 
class msafips;
output out=c94 sum(cntyfips)=sumcnty94;
run;

************************ Exact matches *********************;

data msafile;
merge c89 c90 c94;
by msafips;
if sumcnty89=sumcnty90=sumcnty94; *exact matches;
* these are suppressed throughout 1989-2000 as the MSAs are too small;
if msafips in (1010 1350 2200 2335 2340 3040 3500 4150 5800 5990 6240 6760 7200 7640 8750) then delete;
* these are lost because they are suppressed in 1989;
if msafips in (1020 1260 4080 4100 4243 6015 6820) then delete;
* these are lost because they are suppressed in 1990-2000;
if msafips in (2880 3740 3850) then delete;
fraction=1;
keep msafips fraction;
run;
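
/* A minimal sketch, not part of the original program: the sum of county codes
   identifies the county list of an MSA only loosely, because two different
   lists can share a sum. Counting the counties as well would give a second
   check. The macro name msa_fingerprint_sketch, its parameters and the ncnty
   variables are hypothetical. The macro is defined but never called. */
%macro msa_fingerprint_sketch(in=b94, out=c94_chk, suffix=94);
  proc means data=&in noprint nway;
    class msafips;
    /* sum of county codes plus the number of counties in each MSA */
    output out=&out sum(cntyfips)=sumcnty&suffix n(cntyfips)=ncnty&suffix;
  run;
%mend msa_fingerprint_sketch;
/* Possible call, left commented out: %msa_fingerprint_sketch(in=b94, out=c94_chk, suffix=94) */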

********* Other MSAs with majority overlap throughout *********;

data msaother;
merge c89 c90 c94;
by msafips;
if sumcnty89=sumcnty90=sumcnty94 then delete; *removing exact matches;
if sumcnty94=. then delete;
nomatch=1;
keep msafips nomatch;

* Identifying number of all deaths by county in each year;
data d89; 
set mort.s89; 
yr=1989; 
cntyfips=fipsctyr/1;
*removing missing counties (defined as '999' in the last three digits);
cntyend=fipsctyr-fipsstr*1000;
if cntyend=999 then delete;
keep yr cntyfips;
proc sort; by cntyfips;
proc means noprint nway; 
class cntyfips;
output out=e89 N(yr)=counts89;

data d90; 
set mort.s90; 
yr=1990; 
cntyfips=fipsctyr/1;
*removing missing counties (defined as '999' in the last three digits);
cntyend=fipsctyr-fipsstr*1000;
if cntyend=999 then delete;
keep yr cntyfips;
proc sort; by cntyfips;
proc means noprint nway; 
class cntyfips;
output out=e90 N(yr)=counts90;

data d94; 
set mort.s94; 
yr=1994; 
cntyfips=fipsctyr/1;
msafips=fipspmsa/1;
cntyend=fipsctyr-fipsstr*1000;
if cntyend=999 then delete;
keep yr msafips cntyfips;
proc sort; by cntyfips;
proc means noprint nway; 
class cntyfips;
output out=e94 N(yr)=counts94;

*** Counties identifiable in each year after 1989;
data f;
merge e89 e90 e94;
by cntyfips;
if counts89=. or counts90=. or counts94=. then delete;
consistent=1;
run;

* Calculating what fraction of these identifiable counties cover MSAs;
* 1) deaths in these consistently identifiable counties that are in MSAs;
data g94a;
merge d94 f;
by cntyfips;
if consistent=1; *counties appearing every year;
if msafips in (0 .) then delete; *removing if not in MSA;
keep msafips cntyfips;
proc sort; by msafips;
proc means noprint nway; 
class msafips;
output out=g94part N(msafips)=msa_part;

* 2) deaths in all of the MSAs;
data g94b;
set mort.s94;
msafips=fipspmsa/1;
cntyfips=fipsctyr/1;
if msafips in (0 .) then delete;
keep msafips cntyfips;
proc sort; by msafips;
proc means noprint nway; 
class msafips;
output out=g94all N(msafips)=msa_all;

* 3) calculating the fractions of coverage of MSAs by the identifiable counties;
data h;
merge msaother g94part g94all;
by msafips;
if nomatch=1;
if msa_part=. then delete;
fraction=msa_part/msa_all; *fraction of deaths in each MSA covered by consistent county id;
if fraction>0.5; *majority of deaths in MSA covered by identifiable counties;
keep msafips nomatch fraction;

* 4) matching back to the counties;
proc sort data=g94a; by cntyfips;
data i;
set g94a;
by cntyfips;
if last.cntyfips=1; *creating a list of counties;
proc sort data=i; by msafips;

proc sort data=h; by msafips;
data j;
merge h i;
by msafips;
if nomatch=1;
keep msafips cntyfips fraction;

*********** Now create a county file for converting to consistent MSAs ************;

data k;
merge msafile b94;
by msafips;
if fraction=1;
keep msafips cntyfips fraction;
run;

data mort.msa_county_conversion;
set j k;
proc sort; by cntyfips;
proc print;
run;


********************** Generating county-level murders *********************;

*** 1) Counts for 1973-1988

* There are a lot of issues related to changes in the way NCHS has coded
	things across the years and also changes to county borders. These are
	fixed in several steps. Note that over 1973-81 the NCHS uses its own codes
	rather than FIPS codes, so we need to attach FIPS codes to the data.;

* Start with 1973-81, as that is the period with no FIPS codes;

* We first set aside Virginia as that needs more direct attention;
data a7381nova;
set a73 a74 a75 a76 a77 a78 a79 a80 a81;
if staters>51 then delete;
if staters=47 then delete; *virginia done differently;
nchs=countyrs/1;
*anchorage fix - changes number at 1982;
 if countyrs= 2010 then nchs= 2002;
*missouri fix - ste. genevieve moved up order and pushed three down;
 if countyrs=26094 then nchs=26095;
 if countyrs=26095 then nchs=26096;
 if countyrs=26096 then nchs=26097;
 if countyrs=26097 then nchs=26094;
*nevada fix - ormsby merged with carson city, moving some down one;
 if 29001<=countyrs<=29012 then nchs=countyrs+1;
 if countyrs=29013 then nchs=29001;
*new mexico fix -addition of Cibola in 1981 pushed all down one;
 if 32004<=countyrs<=32032 then nchs=countyrs+1;
proc sort; by nchs;

* The corrected NCHS codes are then merged with FIPS codes for each county;
proc sort data=mort.fips_nchs_conversion; by nchs;
data a7381nova;
 merge a7381nova mort.fips_nchs_conversion;
 by nchs;
run;

* Now focus on Virginia and do direct corrections;
data a7381va;
set a73 a74 a75 a76 a77 a78 a79 a80 a81;
if staters=47;
nchs=countyrs/1;
proc sort; by nchs; 

data va;
input nchs fips;
cards; 
47003   51001
47006	51003
47009	51005
47012	51007
47015	51009
47018	51011
47021	51013
47024	51015
47027	51017
47030	51019
47033	51021
47036	51023
47039	51025
47042	51027
47045	51029
47048	51031
47051	51033
47054	51035
47057	51036
47060	51037
47063	51041
47066	51043
47069	51045
47072	51047
47075	51049
47078	51051
47081	51053
47084	51057
47087	51059
47090	51061
47093	51063
47096	51065
47099	51067
47102	51069
47105	51071
47108	51073
47111	51075
47114	51077
47117	51079
47120	51081
47123	51083
47126	51085
47129	51087
47132	51089
47135	51091
47138	51093
47141	51095
47144	51097
47147	51099
47150	51101
47153	51103
47156	51105
47159	51107
47162	51109
47165	51111
47168	51113
47171	51115
47174	51117
47177	51119
47180	51121
47186	51125
47189	51127
47195	51131
47198	51133
47201	51135
47204	51137
47207	51139
47210	51141
47213	51143
47216	51145
47219	51147
47222	51149
47225	51153
47231	51155
47234	51157
47237	51159
47240	51161
47243	51163
47246	51165
47249	51167
47252	51169
47255	51171
47258	51173
47261	51780
47264	51177
47267	51179
47270	51181
47273	51183
47276	51185
47279	51187
47282	51191
47285	51193
47288	51195
47291	51197
47294	51199
47300	51510
47303	51515
47306	51520
47309	51530
47312	51540
47315	51550
47318	51560
47321	51570
47324	51580
47327	51590
47330	51595
47333	51600
47336	51610
47339	51620
47342	51630
47345	51640
47348	51650
47351	51660
47354	51670
47357	51678
47360	51680
47363	51690
47366	51700
47369	51710
47372	51720
47375	51730
47378	51740
47381	51750
47384	51760
47387	51770
47390	51775
47393	51175
47396	51790
47399	51800
47402	51810
47405	51820
47408	51830
47411	51840
;

data a7381va;
merge a7381va va;
by nchs;

data b7381;
set a7381nova a7381va;
if sex=. then delete;
cntyfips=fips/1;
keep cntyfips yr sex racer3 age;

data a8288;
set a82 a83 a84 a85 a86 a87 a88;
keep cntyfips yr sex racer3 age;
run;

data a7388;
set b7381 a8288;
 if cntyfips in (04012 04027) then cntyfips=04012; 
 if cntyfips in (12025 12086) then cntyfips=12025;  
 if cntyfips in (51019 51031 51680) then cntyfips=51019; 
 if cntyfips in (51053 51149 51730) then cntyfips=51053;
 if cntyfips in (51095 51830) then cntyfips=51095;
 if cntyfips in (51123 51800) then cntyfips=51800;  
 if cntyfips in (51143 51590) then cntyfips=51143;  
 if cntyfips in (51191 51520) then cntyfips=51191;  
 if cntyfips in (51199 51735) then cntyfips=51199;  
proc sort; by cntyfips;

data b7388;
merge mort.msa_county_conversion a7388;
by cntyfips;
if sex=. then delete;
if fraction in (0 .) then delete;
run;
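
* The county consolidations used above recur in several data steps of this
	program. As a hedged sketch only, they could be collected once in a macro.
	The macro name fix_counties is illustrative and is not part of the original
	code. It is defined here but never called, so results are unchanged;
%macro fix_counties;
 if cntyfips in (04012 04027) then cntyfips=04012;
 if cntyfips in (12025 12086) then cntyfips=12025;
 if cntyfips in (51019 51031 51680) then cntyfips=51019;
 if cntyfips in (51053 51149 51730) then cntyfips=51053;
 if cntyfips in (51095 51830) then cntyfips=51095;
 if cntyfips in (51123 51800) then cntyfips=51800;
 if cntyfips in (51143 51590) then cntyfips=51143;
 if cntyfips in (51191 51520) then cntyfips=51191;
 if cntyfips in (51199 51735) then cntyfips=51199;
%mend fix_counties;
/* Example of use inside a data step:
   data a7388; set b7381 a8288; %fix_counties proc sort; by cntyfips; run; */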

*** 2) Counts for 1989-2002;

data a89b; 
set a89; 
if cntyfips=26017 & msafips= 800 then msafips=6960;
if cntyfips=34025 & msafips=4410 then msafips=5190;
if cntyfips=34023 & msafips=5460 then msafips=5015;
if cntyfips=39089 & msafips=5645 then msafips=1840;
if cntyfips=51073 & msafips=5680 then msafips=5720;
if cntyfips=51095 & msafips=5680 then msafips=5720;
if cntyfips=51199 & msafips=5680 then msafips=5720;
if cntyfips=51650 & msafips=5680 then msafips=5720;
if cntyfips=51700 & msafips=5680 then msafips=5720;
if cntyfips=51735 & msafips=5680 then msafips=5720;
if cntyfips=51830 & msafips=5680 then msafips=5720;
if cntyfips=42069 & msafips=5745 then msafips=7560;
if cntyfips=42079 & msafips=5745 then msafips=7560;
if cntyfips=34031 & msafips=6040 then msafips= 875;
if cntyfips=51053 & msafips=6140 then msafips=6760;
if cntyfips=51149 & msafips=6140 then msafips=6760;
if cntyfips=51570 & msafips=6140 then msafips=6760;
if cntyfips=51670 & msafips=6140 then msafips=6760;
if cntyfips=51730 & msafips=6140 then msafips=6760;
if cntyfips=45091 & msafips=6885 then msafips=1520;
if cntyfips=37025 & msafips=7140 then msafips=1520;
if cntyfips=37159 & msafips=7140 then msafips=1520;
if cntyfips=39023 & msafips=7960 then msafips=2000;
keep cntyfips msafips yr sex racer3 age;

data a8993;
set a89b a90 a91 a92 a93;
if cntyfips= 6059 & msafips= 360 then msafips=5945;
if cntyfips=18095 & msafips= 400 then msafips=3480;
if cntyfips=45007 & msafips= 405 then msafips=3160;
if cntyfips=17089 & msafips= 620 then msafips=1600;
if cntyfips=17093 & msafips= 620 then msafips=1600;
if cntyfips=26025 & msafips= 780 then msafips=3720;
if cntyfips=42007 & msafips= 845 then msafips=6280;
if cntyfips=12081 & msafips=1140 then msafips=7510;
if cntyfips= 9001 & msafips=1163 then msafips=5483;
if cntyfips=37001 & msafips=1300 then msafips=3120;
if cntyfips=17063 & msafips=3690 then msafips=1600;
if cntyfips=17197 & msafips=3690 then msafips=1600;
if cntyfips=17097 & msafips=3965 then msafips=1600;
if cntyfips=39093 & msafips=4440 then msafips=1680;
if cntyfips=33011 & msafips=4763 then msafips=1123;
if cntyfips=48329 & msafips=5040 then msafips=5800;
if cntyfips=26121 & msafips=5320 then msafips=3000;
if cntyfips=25005 & msafips=5403 then msafips=1123;
if cntyfips=36063 & msafips=5700 then msafips=1280;
if cntyfips=36071 & msafips=5950 then msafips=5660;
if cntyfips= 6111 & msafips=6000 then msafips=8735;
if cntyfips=28059 & msafips=6025 then msafips=0920;
if cntyfips=33015 & msafips=6453 then msafips=1123;
if cntyfips=33017 & msafips=6453 then msafips=1123;
if cntyfips=36027 & msafips=6460 then msafips=2281;
if cntyfips=53011 & msafips=8725 then msafips=6440;
if cntyfips=25027 & msafips=9243 then msafips=1123;

data a8902;
set a8993 a94 a95 a96 a97 a98 a99 a00 a01 a02;
if cntyfips in (04012 04027) then cntyfips=04012; 
if cntyfips in (12025 12086) then cntyfips=12025;  
if cntyfips in (51019 51031 51680) then cntyfips=51019; 
if cntyfips in (51053 51149 51730) then cntyfips=51053;
if cntyfips in (51095 51830) then cntyfips=51095;
if cntyfips in (51123 51800) then cntyfips=51800;  
if cntyfips in (51143 51590) then cntyfips=51143;  
if cntyfips in (51191 51520) then cntyfips=51191;  
if cntyfips in (51199 51735) then cntyfips=51199;  
proc sort; by msafips;

* now merging the identifiers - first the MSA files;
data b8902;
merge msafile a8902;
by msafips;
if fraction=. then delete;
if sex=. then delete;
run;

* now merging the county files;
data c8902;
set a8902;
drop msafips;
proc sort; by cntyfips;
proc sort data=mort.msa_county_conversion; by cntyfips;
data d8902;
merge mort.msa_county_conversion c8902;
by cntyfips;
if fraction=. then delete;
if sex=. then delete;

************* Putting all of the years together ***************;
data a7302;
set b7388 b8902 d8902;
* race recoding;
if racer3=1 then race=1; *white;
if 1973<=yr<=1978 & racer3=2 then race=2; *black;
if 1979<=yr<=2002 & racer3=3 then race=2; *black;
if 1973<=yr<=1978 & racer3=3 then race=3; *other race;
if 1979<=yr<=2002 & racer3=2 then race=3; *other race;
* age recoding to create age groups;
agegrp=14; *default group, ages 0-14;
if age=999 then agegrp=.; *missing values;
if (15<=age<=19) then agegrp=1519;
if (20<=age<=24) then agegrp=2024;
if (25<=age<=39) then agegrp=2539;
if (40<=age<=150) then agegrp=4099;
* MSA consolidations to match with the PUMS data;
if msafips=1145 then msafips=3360; *Brazoria is assigned to Houston;
if msafips=5775 then msafips=7360; *Oakland is assigned to San Francisco;
if msafips=2800 then msafips=1920; *Fort Worth is assigned to Dallas;

keep yr msafips sex race agegrp;
proc sort; by msafips yr;
run;

* Create murder rates that are the primary independent variable of interest 
for Tables 5, 6 and A7;

*** Murder counts by MSA of residence, year, sex and race;
data msamurd1;
set a7302;
if agegrp=2024; *age restriction of 20-24;
if sex in (1 2);
if race in (1 2);

proc means noprint nway; 
class msafips yr sex race;
output out=msamurd2 N(yr)=murders;
proc sort data=msamurd2; by msafips yr sex race;
run;

*** Population counts;

data msapop;
set pop2;
*combining the 15-19 and 20-24 groups to 15-24;
if agegrp=1519 then agegrp=1524;
if agegrp=2024 then agegrp=1524;
if agegrp=1524;
*sample restrictions;
if sex in (1 2);
if race in (1 2);
*corrections to county codes;
if cntyfips in (04012 04027) then cntyfips=04012; 
if cntyfips in (12025 12086) then cntyfips=12025;  
if cntyfips in (51019 51031 51680) then cntyfips=51019; 
if cntyfips in (51053 51149 51730) then cntyfips=51053;
if cntyfips in (51095 51830) then cntyfips=51095;
if cntyfips in (51123 51800) then cntyfips=51800;  
if cntyfips in (51143 51590) then cntyfips=51143;  
if cntyfips in (51191 51520) then cntyfips=51191;  
if cntyfips in (51199 51735) then cntyfips=51199;  
* corrections getting SEER data more consistent;
if cntyfips=8911 then cntyfips=8911;
if cntyfips=8013 then cntyfips=8912;
if cntyfips=8059 then cntyfips=8913;
if cntyfips=8123 then cntyfips=8914;
if cntyfips=36910 then cntyfips=36005;
if cntyfips in (51013 51059 51510 51600 51610 51918) then cntyfips=51013;
if cntyfips in (51153 51683 51685 51910) then cntyfips=51510;
proc sort; by cntyfips;

* merging MSA codes into the county data;
proc sort data=mort.msa_county_conversion; by cntyfips;
data msapop1;
merge mort.msa_county_conversion msapop; *msapop is the county-level population file created above;
by cntyfips;
* MSA consolidations to match with the PUMS data;
if msafips=1145 then msafips=3360; *Brazoria is assigned to Houston;
if msafips=5775 then msafips=7360; *Oakland is assigned to San Francisco;
if msafips=2800 then msafips=1920; *Fort Worth is assigned to Dallas;

proc means noprint nway; 
class msafips yr sex race;
output out=msapop2 sum(pop)=population;
proc sort data=msapop2; by msafips yr sex race;

*** Rates - by merging murder counts and population;
data msarates;
merge msamurd2 msapop2;
by msafips yr sex race; *both files are sorted by these MSA-level keys;
if murders=. then murders=0;
murdersper100k=murders/population*100000;
keep yr msafips sex race murders murdersper100k population;
rename yr=year;
proc freq data=msarates; tables msafips year sex race; *yr was renamed to year above;

proc export data=msarates outfile='C:\Users\tmoore\Desktop\crack\sas\msa_murderrate_2024.csv' replace;
run;
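
* Optional check, offered as a hedged sketch and not part of the original
	program. It assumes msarates was created by the merge above and lists the
	first few cells that have murders but no usable population count, which
	would leave the murder rate missing;
proc print data=msarates(obs=20);
where population in (. 0) and murders>0;
title 'Check - murder cells without a population count';
run;
title;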


*****************************************************
*** Identifying Cocaine-related Deaths, 1973-1998 ***
*****************************************************;

* Identify deaths where cocaine-related conditions are included as any cause of death in ICD-8 and ICD-9. 
	We keep information on geographic identifiers;
*** Inputting the mortality data;
data a79; set mort.s79; yr=1979; 
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc;
data a80; set mort.s80; yr=1980; 
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end; 
	if cocaine=1; keep yr stateoc countyoc;
data a81; set mort.s81; yr=1981; 
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end; 
	if cocaine=1; keep yr stateoc countyoc fipsctyo;
data a82; set mort.s82; yr=1982; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a83; set mort.s83; yr=1983; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a84; set mort.s84; yr=1984; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a85; set mort.s85; yr=1985; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a86; set mort.s86; yr=1986; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a87; set mort.s87; yr=1987; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a88; set mort.s88; yr=1988; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips;
data a89; set mort.s89; yr=1989; msafipsr=fipssmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a90; set mort.s90; yr=1990; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('30420' '30560') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a91; set mort.s91; yr=1991; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('03042' '03056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a92; set mort.s92; yr=1992; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('03042' '03056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a93; set mort.s93; yr=1993; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('03042' '03056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a94; set mort.s94; yr=1994; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('03042' '03056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a95; set mort.s95; yr=1995; msafipsr=fipspmsa/1; cntyfips=fipsctyo/1;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('03042' '03056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a96; set mort.s96; yr=1996; msafipsr=fipspmsa/1; cntyfips=fipssto*1000 + fipscty;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('3042' '3056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a97; set mort.s97; yr=1997; msafipsr=fipspmsa/1; cntyfips=fipssto*1000 + fipscty;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('3042' '3056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
data a98; set mort.s98; yr=1998; msafipsr=fipspmsa/1; cntyfips=fipssto*1000 + fipscty;
	array cause(18) record_1-record_18; do i = 1 to 18; if cause(i) in ('3042' '3056') then cocaine=1; end;
	if cocaine=1; keep yr stateoc countyoc cntyfips msafipsr;
run;
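
* The array scan repeated in each data step above could be written once as a
	macro. This is a hedged sketch, and the macro name flag_cocaine and its codes
	parameter are illustrative rather than part of the original code. It is
	defined here but never called, so results are unchanged;
%macro flag_cocaine(codes);
 array cause(18) record_1-record_18;
 do i = 1 to 18;
  if cause(i) in (&codes) then cocaine=1;
 end;
 if cocaine=1;
%mend flag_cocaine;
/* Example of use, mirroring the 1979 step:
   data a79; set mort.s79; yr=1979;
   %flag_cocaine('30420' '30560')
   keep yr stateoc countyoc; */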

*** 1) Counts for 1979-1988;

* There are a lot of issues related to changes in the way NCHS has coded
	things across the years and also changes to county borders. These are
	fixed in several steps. Note that over 1973-81 the NCHS has its own codes
	but does not use FIPS codes, so we need to attach FIPS codes to the data.;

* Start with 1979-81, as that is the period with no FIPS codes;

* We first set aside Virginia as that needs more direct attention;
data a7381nova;
set a79 a80 a81;
if stateoc=47 then delete; *virginia done differently;
nchs=countyoc/1;
*anchorage fix - changes number at 1982;
 if countyoc= 2010 then nchs= 2002;
*missouri fix - ste. genevieve moved up order and pushed three down;
 if countyoc=26094 then nchs=26095;
 if countyoc=26095 then nchs=26096;
 if countyoc=26096 then nchs=26097;
 if countyoc=26097 then nchs=26094;
*nevada fix - ormsby merged with carson city, moving some down one;
 if 29001<=countyoc<=29012 then nchs=countyoc+1;
 if countyoc=29013 then nchs=29001;
*new mexico fix -addition of Cibola in 1981 pushed all down one;
 if 32004<=countyoc<=32032 then nchs=countyoc+1;
proc sort; by nchs;

* The corrected NCHS codes are then merged with FIPS codes for each county;
proc sort data=mort.fips_nchs_conversion; by nchs;
data a7381nova;
 merge a7381nova mort.fips_nchs_conversion;
 by nchs;
run;

* Now focus on Virginia and do direct corrections;
data a7381va;
set a79 a80 a81; *cocaine-related deaths are only identified from 1979;
if stateoc=47;
nchs=countyoc/1;
proc sort; by nchs; 

data va;
input nchs fips;
cards; 
47003   51001
47006	51003
47009	51005
47012	51007
47015	51009
47018	51011
47021	51013
47024	51015
47027	51017
47030	51019
47033	51021
47036	51023
47039	51025
47042	51027
47045	51029
47048	51031
47051	51033
47054	51035
47057	51036
47060	51037
47063	51041
47066	51043
47069	51045
47072	51047
47075	51049
47078	51051
47081	51053
47084	51057
47087	51059
47090	51061
47093	51063
47096	51065
47099	51067
47102	51069
47105	51071
47108	51073
47111	51075
47114	51077
47117	51079
47120	51081
47123	51083
47126	51085
47129	51087
47132	51089
47135	51091
47138	51093
47141	51095
47144	51097
47147	51099
47150	51101
47153	51103
47156	51105
47159	51107
47162	51109
47165	51111
47168	51113
47171	51115
47174	51117
47177	51119
47180	51121
47186	51125
47189	51127
47195	51131
47198	51133
47201	51135
47204	51137
47207	51139
47210	51141
47213	51143
47216	51145
47219	51147
47222	51149
47225	51153
47231	51155
47234	51157
47237	51159
47240	51161
47243	51163
47246	51165
47249	51167
47252	51169
47255	51171
47258	51173
47261	51780
47264	51177
47267	51179
47270	51181
47273	51183
47276	51185
47279	51187
47282	51191
47285	51193
47288	51195
47291	51197
47294	51199
47300	51510
47303	51515
47306	51520
47309	51530
47312	51540
47315	51550
47318	51560
47321	51570
47324	51580
47327	51590
47330	51595
47333	51600
47336	51610
47339	51620
47342	51630
47345	51640
47348	51650
47351	51660
47354	51670
47357	51678
47360	51680
47363	51690
47366	51700
47369	51710
47372	51720
47375	51730
47378	51740
47381	51750
47384	51760
47387	51770
47390	51775
47393	51175
47396	51790
47399	51800
47402	51810
47405	51820
47408	51830
47411	51840
;

data a7381va;
merge a7381va va;
by nchs;

data b7381;
set a7381nova a7381va;
cntyfips=fips/1;
keep cntyfips yr;

data a8288;
set a82 a83 a84 a85 a86 a87 a88;
keep cntyfips yr;
run;

data a7388;
set b7381 a8288;
 if cntyfips in (04012 04027) then cntyfips=04012; 
 if cntyfips in (12025 12086) then cntyfips=12025;  
 if cntyfips in (51019 51031 51680) then cntyfips=51019; 
 if cntyfips in (51053 51149 51730) then cntyfips=51053;
 if cntyfips in (51095 51830) then cntyfips=51095;
 if cntyfips in (51123 51800) then cntyfips=51800;  
 if cntyfips in (51143 51590) then cntyfips=51143;  
 if cntyfips in (51191 51520) then cntyfips=51191;  
 if cntyfips in (51199 51735) then cntyfips=51199;  
proc sort; by cntyfips;

* Merging with the conversion of MSA to counties;
data b7388;
merge mort.msa_county_conversion a7388;
by cntyfips;
if yr=. then delete;
if msafips=. then delete;
keep msafips yr;
run;

* Years 1989-1998;
data b8998;
set a89 a90 a91 a92 a93 a94 a95 a96 a97 a98;
keep msafipsr yr;
rename msafipsr=msafips;

*** Combining the years;
data a7398;
set b7388 b8998;
if msafips=1145 then msafips=3360; *Brazoria is assigned to Houston;
if msafips=5775 then msafips=7360; *Oakland is assigned to San Francisco;
if msafips=2800 then msafips=1920; *Fort Worth is assigned to Dallas;
proc sort; by msafips yr;

* Aggregating annually at the MSA level;
proc means noprint nway; 
class msafips yr;
output out=b7398 n(yr)=deaths; 
data c7398;
set b7398;
keep msafips yr deaths;
run;

*********************************************************************************
* Getting the full set of possible MSA observations so that the zeroes are added;
data fullset;
set mort.codes1994;
if msafips=1145 then msafips=3360; *Brazoria is assigned to Houston;
if msafips=5775 then msafips=7360; *Oakland is assigned to San Francisco;
if msafips=2800 then msafips=1920; *Fort Worth is assigned to Dallas;
proc sort; by msafips;

data fullset2;
set fullset;
by msafips;
if last.msafips=1; *retaining one observation for each MSA;
keep msafips;

data fullset3;
set fullset2;
do yr=1973 to 1998; *creating an MSA observation for every year;
output; end;
proc sort; by msafips yr;

data d7398;
merge fullset3 c7398;
by msafips yr;
if deaths=. then deaths=0;

proc transpose data=d7398 out=e7398 prefix=yr;
by msafips;
id yr;
var deaths;
run;
* Calculating the crack cocaine entry dates, using the annual counts for 1979-1998;
data f7398;
set e7398;
array Ayr(20) yr1979-yr1998;
array Atwo_in_row(20) two_in_row1979-two_in_row1998;
array Atwo_or_more(20) two_or_more1979-two_or_more1998;
array Atwo_in_three(20) two_in_three1979-two_in_three1998;
array Athree_in_row(20) three_in_row1979-three_in_row1998;

* Main measure - crack cocaine deaths two years in a row;
do i = 1 to 19;
 if Ayr(i)>0 & Ayr(i + 1)>0 then Atwo_in_row(i)=1;
 else Atwo_in_row(i)=0;
end; drop i;
	 if two_in_row1979=1 then two_in_row=1979;
else if two_in_row1980=1 then two_in_row=1980;
else if two_in_row1981=1 then two_in_row=1981;
else if two_in_row1982=1 then two_in_row=1982;
else if two_in_row1983=1 then two_in_row=1983;
else if two_in_row1984=1 then two_in_row=1984;
else if two_in_row1985=1 then two_in_row=1985;
else if two_in_row1986=1 then two_in_row=1986;
else if two_in_row1987=1 then two_in_row=1987;
else if two_in_row1988=1 then two_in_row=1988;
else if two_in_row1989=1 then two_in_row=1989;
else if two_in_row1990=1 then two_in_row=1990;
else if two_in_row1991=1 then two_in_row=1991;
else if two_in_row1992=1 then two_in_row=1992;
else if two_in_row1993=1 then two_in_row=1993;
else if two_in_row1994=1 then two_in_row=1994;
else if two_in_row1995=1 then two_in_row=1995;
else if two_in_row1996=1 then two_in_row=1996;
else if two_in_row1997=1 then two_in_row=1997;
else if two_in_row1998=1 then two_in_row=1998;

* Alternate measure - two or more deaths in a year;
do i = 1 to 20; *no look-ahead is needed for this measure, so 1998 is included;
 if Ayr(i)>1 then Atwo_or_more(i)=1;
 else Atwo_or_more(i)=0;
end; drop i;
	 if two_or_more1979=1 then two_or_more=1979;
else if two_or_more1980=1 then two_or_more=1980;
else if two_or_more1981=1 then two_or_more=1981;
else if two_or_more1982=1 then two_or_more=1982;
else if two_or_more1983=1 then two_or_more=1983;
else if two_or_more1984=1 then two_or_more=1984;
else if two_or_more1985=1 then two_or_more=1985;
else if two_or_more1986=1 then two_or_more=1986;
else if two_or_more1987=1 then two_or_more=1987;
else if two_or_more1988=1 then two_or_more=1988;
else if two_or_more1989=1 then two_or_more=1989;
else if two_or_more1990=1 then two_or_more=1990;
else if two_or_more1991=1 then two_or_more=1991;
else if two_or_more1992=1 then two_or_more=1992;
else if two_or_more1993=1 then two_or_more=1993;
else if two_or_more1994=1 then two_or_more=1994;
else if two_or_more1995=1 then two_or_more=1995;
else if two_or_more1996=1 then two_or_more=1996;
else if two_or_more1997=1 then two_or_more=1997;
else if two_or_more1998=1 then two_or_more=1998;

* Alternate measure - deaths in at least two years out of three;
do i = 1 to 18;
 if Ayr(i)>0 & Ayr(i + 1)>0 then Atwo_in_three(i)=1;
 else if Ayr(i)>0 & Ayr(i + 2)>0 then Atwo_in_three(i)=1;
 else Atwo_in_three(i)=0;
end; drop i;
	 if two_in_three1979=1 then two_in_three=1979;
else if two_in_three1980=1 then two_in_three=1980;
else if two_in_three1981=1 then two_in_three=1981;
else if two_in_three1982=1 then two_in_three=1982;
else if two_in_three1983=1 then two_in_three=1983;
else if two_in_three1984=1 then two_in_three=1984;
else if two_in_three1985=1 then two_in_three=1985;
else if two_in_three1986=1 then two_in_three=1986;
else if two_in_three1987=1 then two_in_three=1987;
else if two_in_three1988=1 then two_in_three=1988;
else if two_in_three1989=1 then two_in_three=1989;
else if two_in_three1990=1 then two_in_three=1990;
else if two_in_three1991=1 then two_in_three=1991;
else if two_in_three1992=1 then two_in_three=1992;
else if two_in_three1993=1 then two_in_three=1993;
else if two_in_three1994=1 then two_in_three=1994;
else if two_in_three1995=1 then two_in_three=1995;
else if two_in_three1996=1 then two_in_three=1996;
else if two_in_three1997=1 then two_in_three=1997;
else if two_in_three1998=1 then two_in_three=1998;

* Alternate measure - deaths in three years in a row;
do i = 1 to 18;
 if Ayr(i)>0 & Ayr(i + 1)>0 & Ayr(i + 2)>0 then Athree_in_row(i)=1;
 else Athree_in_row(i)=0;
end; drop i;
	 if three_in_row1979=1 then three_in_row=1979;
else if three_in_row1980=1 then three_in_row=1980;
else if three_in_row1981=1 then three_in_row=1981;
else if three_in_row1982=1 then three_in_row=1982;
else if three_in_row1983=1 then three_in_row=1983;
else if three_in_row1984=1 then three_in_row=1984;
else if three_in_row1985=1 then three_in_row=1985;
else if three_in_row1986=1 then three_in_row=1986;
else if three_in_row1987=1 then three_in_row=1987;
else if three_in_row1988=1 then three_in_row=1988;
else if three_in_row1989=1 then three_in_row=1989;
else if three_in_row1990=1 then three_in_row=1990;
else if three_in_row1991=1 then three_in_row=1991;
else if three_in_row1992=1 then three_in_row=1992;
else if three_in_row1993=1 then three_in_row=1993;
else if three_in_row1994=1 then three_in_row=1994;
else if three_in_row1995=1 then three_in_row=1995;
else if three_in_row1996=1 then three_in_row=1996;
else if three_in_row1997=1 then three_in_row=1997;
else if three_in_row1998=1 then three_in_row=1998;

keep msafips two_in_row two_or_more two_in_three three_in_row;
run;
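
* A more compact way to obtain the main two_in_row date, shown as a hedged
	sketch only. It scans the annual counts once and stops at the first year
	that is followed by another year with deaths. The dataset f7398_alt and the
	variable two_in_row_alt are illustrative names that are not used elsewhere
	in this program;
data f7398_alt;
set e7398;
array Ayr(20) yr1979-yr1998;
two_in_row_alt=.;
do i = 1 to 19 until (two_in_row_alt ne .);
 if Ayr(i)>0 & Ayr(i + 1)>0 then two_in_row_alt=1978+i; *element 1 is 1979;
end;
keep msafips two_in_row_alt;
run;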

*** Merging in MSA names and population ***;
proc sort data=mort.msa_names; by msafips;
proc sort data=f7398; by msafips;

data mort.msa_start_dates;
merge mort.msa_names f7398;
by msafips;
if msafips=0 then delete;
if msaname=' ' then delete;
if two_in_row<1982 then delete; *SAS treats missing as less than any number, so MSAs with no date are also dropped;

proc print; title 'Table 2 - MSA crack intro - showing different measures';
run;

***********************************************************
*********** State-based crack arrival dates ***************
***********************************************************;

data a7998;
set a79 a80 a81 a82 a83 a84 a85 a86 a87 a88 a89 a90 a91 a92 a93 a94 a95 a96 a97 a98;
keep stateoc yr;
proc sort; by stateoc yr;

* Aggregating annually at the state level;
proc means noprint nway; 
class stateoc yr;
output out=b7998 n(yr)=deaths; 
data c7998;
set b7998;
state=stateoc/1;
keep state yr deaths;
run;

*********************************************************************************
* Getting the full set of possible state observations so that the zeroes are added;

data fullset;
do state=1 to 51; *creating an observation for every state;
do yr=1979 to 1998; *and for every year;
output; end;
end;
proc sort; by state yr;

data d7998;
merge fullset c7998;
by state yr;
if deaths=. then deaths=0;

proc transpose data=d7998 out=e7998 prefix=yr;
by state;
id yr;
var deaths;
run;

* Calculating the crack cocaine entry dates;
data f7998;
set e7998;
array Ayr(20) yr1979-yr1998;
array Atwo_in_row(20) two_in_row1979-two_in_row1998;
array Atwo_or_more(20) two_or_more1979-two_or_more1998;
array Atwo_in_three(20) two_in_three1979-two_in_three1998;
array Atwo_in_three2(20) two_in_three21979-two_in_three21998;
array Athree_in_row(20) three_in_row1979-three_in_row1998;
array Athree_in_row2(20) three_in_row21979-three_in_row21998;

* Main measure - crack cocaine deaths two years in a row;
do i = 1 to 19;
 if Ayr(i)>0 & Ayr(i + 1)>0 then Atwo_in_row(i)=1;
 else Atwo_in_row(i)=0;
end; drop i;
	 if two_in_row1979=1 then two_in_row=1979;
else if two_in_row1980=1 then two_in_row=1980;
else if two_in_row1981=1 then two_in_row=1981;
else if two_in_row1982=1 then two_in_row=1982;
else if two_in_row1983=1 then two_in_row=1983;
else if two_in_row1984=1 then two_in_row=1984;
else if two_in_row1985=1 then two_in_row=1985;
else if two_in_row1986=1 then two_in_row=1986;
else if two_in_row1987=1 then two_in_row=1987;
else if two_in_row1988=1 then two_in_row=1988;
else if two_in_row1989=1 then two_in_row=1989;
else if two_in_row1990=1 then two_in_row=1990;
else if two_in_row1991=1 then two_in_row=1991;
else if two_in_row1992=1 then two_in_row=1992;
else if two_in_row1993=1 then two_in_row=1993;
else if two_in_row1994=1 then two_in_row=1994;
else if two_in_row1995=1 then two_in_row=1995;
else if two_in_row1996=1 then two_in_row=1996;
else if two_in_row1997=1 then two_in_row=1997;
else if two_in_row1998=1 then two_in_row=1998;

* Measure for large states - two or more deaths in a year;
do i = 1 to 20; *no look-ahead is needed for this measure, so 1998 is included;
 if Ayr(i)>1 then Atwo_or_more(i)=1;
 else Atwo_or_more(i)=0;
end; drop i;
	 if two_or_more1979=1 then two_or_more=1979;
else if two_or_more1980=1 then two_or_more=1980;
else if two_or_more1981=1 then two_or_more=1981;
else if two_or_more1982=1 then two_or_more=1982;
else if two_or_more1983=1 then two_or_more=1983;
else if two_or_more1984=1 then two_or_more=1984;
else if two_or_more1985=1 then two_or_more=1985;
else if two_or_more1986=1 then two_or_more=1986;
else if two_or_more1987=1 then two_or_more=1987;
else if two_or_more1988=1 then two_or_more=1988;
else if two_or_more1989=1 then two_or_more=1989;
else if two_or_more1990=1 then two_or_more=1990;
else if two_or_more1991=1 then two_or_more=1991;
else if two_or_more1992=1 then two_or_more=1992;
else if two_or_more1993=1 then two_or_more=1993;
else if two_or_more1994=1 then two_or_more=1994;
else if two_or_more1995=1 then two_or_more=1995;
else if two_or_more1996=1 then two_or_more=1996;
else if two_or_more1997=1 then two_or_more=1997;
else if two_or_more1998=1 then two_or_more=1998;

* Alternate measure - deaths in at least two years out of three;
do i = 1 to 18;
 if Ayr(i)>0 & Ayr(i + 1)>0 then Atwo_in_three(i)=1;
 else if Ayr(i)>0 & Ayr(i + 2)>0 then Atwo_in_three(i)=1;
 else Atwo_in_three(i)=0;
end; drop i;
	 if two_in_three1979=1 then two_in_three=1979;
else if two_in_three1980=1 then two_in_three=1980;
else if two_in_three1981=1 then two_in_three=1981;
else if two_in_three1982=1 then two_in_three=1982;
else if two_in_three1983=1 then two_in_three=1983;
else if two_in_three1984=1 then two_in_three=1984;
else if two_in_three1985=1 then two_in_three=1985;
else if two_in_three1986=1 then two_in_three=1986;
else if two_in_three1987=1 then two_in_three=1987;
else if two_in_three1988=1 then two_in_three=1988;
else if two_in_three1989=1 then two_in_three=1989;
else if two_in_three1990=1 then two_in_three=1990;
else if two_in_three1991=1 then two_in_three=1991;
else if two_in_three1992=1 then two_in_three=1992;
else if two_in_three1993=1 then two_in_three=1993;
else if two_in_three1994=1 then two_in_three=1994;
else if two_in_three1995=1 then two_in_three=1995;
else if two_in_three1996=1 then two_in_three=1996;
else if two_in_three1997=1 then two_in_three=1997;
else if two_in_three1998=1 then two_in_three=1998;

* Alternate measure for large states - two or more deaths in at least two years out of three;
do i = 1 to 18;
 if Ayr(i)>1 & Ayr(i + 1)>1 then Atwo_in_three2(i)=1;
 else if Ayr(i)>1 & Ayr(i + 2)>1 then Atwo_in_three2(i)=1;
 else Atwo_in_three2(i)=0;
end; drop i;
	 if two_in_three21979=1 then two_in_three2=1979;
else if two_in_three21980=1 then two_in_three2=1980;
else if two_in_three21981=1 then two_in_three2=1981;
else if two_in_three21982=1 then two_in_three2=1982;
else if two_in_three21983=1 then two_in_three2=1983;
else if two_in_three21984=1 then two_in_three2=1984;
else if two_in_three21985=1 then two_in_three2=1985;
else if two_in_three21986=1 then two_in_three2=1986;
else if two_in_three21987=1 then two_in_three2=1987;
else if two_in_three21988=1 then two_in_three2=1988;
else if two_in_three21989=1 then two_in_three2=1989;
else if two_in_three21990=1 then two_in_three2=1990;
else if two_in_three21991=1 then two_in_three2=1991;
else if two_in_three21992=1 then two_in_three2=1992;
else if two_in_three21993=1 then two_in_three2=1993;
else if two_in_three21994=1 then two_in_three2=1994;
else if two_in_three21995=1 then two_in_three2=1995;
else if two_in_three21996=1 then two_in_three2=1996;
else if two_in_three21997=1 then two_in_three2=1997;
else if two_in_three21998=1 then two_in_three2=1998;

* Alternate measure - deaths in each of three consecutive years;
do i = 1 to 18;
 if Ayr(i)>0 & Ayr(i + 1)>0 & Ayr(i + 2)>0 then Athree_in_row(i)=1;
 else Athree_in_row(i)=0;
end; drop i;
	 if three_in_row1979=1 then three_in_row=1979;
else if three_in_row1980=1 then three_in_row=1980;
else if three_in_row1981=1 then three_in_row=1981;
else if three_in_row1982=1 then three_in_row=1982;
else if three_in_row1983=1 then three_in_row=1983;
else if three_in_row1984=1 then three_in_row=1984;
else if three_in_row1985=1 then three_in_row=1985;
else if three_in_row1986=1 then three_in_row=1986;
else if three_in_row1987=1 then three_in_row=1987;
else if three_in_row1988=1 then three_in_row=1988;
else if three_in_row1989=1 then three_in_row=1989;
else if three_in_row1990=1 then three_in_row=1990;
else if three_in_row1991=1 then three_in_row=1991;
else if three_in_row1992=1 then three_in_row=1992;
else if three_in_row1993=1 then three_in_row=1993;
else if three_in_row1994=1 then three_in_row=1994;
else if three_in_row1995=1 then three_in_row=1995;
else if three_in_row1996=1 then three_in_row=1996;
else if three_in_row1997=1 then three_in_row=1997;
else if three_in_row1998=1 then three_in_row=1998;

* Alternate measure for large states - two or more deaths in each of three consecutive years;
do i = 1 to 18;
 if Ayr(i)>1 & Ayr(i + 1)>1 & Ayr(i + 2)>1 then Athree_in_row2(i)=1;
 else Athree_in_row2(i)=0;
end; drop i;
	 if three_in_row21979=1 then three_in_row2=1979;
else if three_in_row21980=1 then three_in_row2=1980;
else if three_in_row21981=1 then three_in_row2=1981;
else if three_in_row21982=1 then three_in_row2=1982;
else if three_in_row21983=1 then three_in_row2=1983;
else if three_in_row21984=1 then three_in_row2=1984;
else if three_in_row21985=1 then three_in_row2=1985;
else if three_in_row21986=1 then three_in_row2=1986;
else if three_in_row21987=1 then three_in_row2=1987;
else if three_in_row21988=1 then three_in_row2=1988;
else if three_in_row21989=1 then three_in_row2=1989;
else if three_in_row21990=1 then three_in_row2=1990;
else if three_in_row21991=1 then three_in_row2=1991;
else if three_in_row21992=1 then three_in_row2=1992;
else if three_in_row21993=1 then three_in_row2=1993;
else if three_in_row21994=1 then three_in_row2=1994;
else if three_in_row21995=1 then three_in_row2=1995;
else if three_in_row21996=1 then three_in_row2=1996;
else if three_in_row21997=1 then three_in_row2=1997;
else if three_in_row21998=1 then three_in_row2=1998;

keep state two_in_row two_or_more two_in_three two_in_three2 three_in_row three_in_row2;
run;
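
/* Illustrative sketch only, not part of the original replication code.
   Each if/else chain above selects the first year from 1979 to 1998 whose
   flag equals 1. Assuming the flags are held in an array whose first element
   corresponds to 1979, the same rule can be written as a loop. The names
   flag1979-flag1998, Aflag and first_year are hypothetical, and the step
   writes only to the log (data _null_), so no data set is created. */
data _null_;
 array Aflag(20) flag1979-flag1998;
 * Hypothetical example data: the flag first switches on in 1984 (element 6);
 do i = 1 to 20;
  Aflag(i) = (i >= 6);
 end;
 * First year with a flag equal to 1, or missing if no flag is ever set;
 first_year = .;
 do i = 1 to 20;
  if Aflag(i)=1 then do;
   first_year = 1978 + i;
   leave;
  end;
 end;
 put first_year=;
run;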

* Calculating the crack cocaine entry dates;
data g7998;
set f7998;
* Adding postal abbreviations to the NCHS state codes (1-51, alphabetical, including DC) for matching to the population data;
if state =  1 then stname = 'AL'; if state =  2 then stname = 'AK';
if state =  3 then stname = 'AZ'; if state =  4 then stname = 'AR';
if state =  5 then stname = 'CA'; if state =  6 then stname = 'CO';
if state =  7 then stname = 'CT'; if state =  8 then stname = 'DE';
if state =  9 then stname = 'DC'; if state = 10 then stname = 'FL';
if state = 11 then stname = 'GA'; if state = 12 then stname = 'HI';
if state = 13 then stname = 'ID'; if state = 14 then stname = 'IL';
if state = 15 then stname = 'IN'; if state = 16 then stname = 'IA';
if state = 17 then stname = 'KS'; if state = 18 then stname = 'KY';
if state = 19 then stname = 'LA'; if state = 20 then stname = 'ME';
if state = 21 then stname = 'MD'; if state = 22 then stname = 'MA';
if state = 23 then stname = 'MI'; if state = 24 then stname = 'MN';
if state = 25 then stname = 'MS'; if state = 26 then stname = 'MO';
if state = 27 then stname = 'MT'; if state = 28 then stname = 'NE';
if state = 29 then stname = 'NV'; if state = 30 then stname = 'NH';
if state = 31 then stname = 'NJ'; if state = 32 then stname = 'NM';
if state = 33 then stname = 'NY'; if state = 34 then stname = 'NC';
if state = 35 then stname = 'ND'; if state = 36 then stname = 'OH';
if state = 37 then stname = 'OK'; if state = 38 then stname = 'OR';
if state = 39 then stname = 'PA'; if state = 40 then stname = 'RI';
if state = 41 then stname = 'SC'; if state = 42 then stname = 'SD';
if state = 43 then stname = 'TN'; if state = 44 then stname = 'TX';
if state = 45 then stname = 'UT'; if state = 46 then stname = 'VT';
if state = 47 then stname = 'VA'; if state = 48 then stname = 'WA';
if state = 49 then stname = 'WV'; if state = 50 then stname = 'WI';
if state = 51 then stname = 'WY';
run;
* Note: change the file path below to match your own folder structure;
proc export data=g7998 outfile='C:\Users\tmoore\Desktop\crack\sas\state_crack_entry.csv' dbms=csv replace;
run;
